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Abstract 

As an example of the recently-introduced concept of rate of innovation, signals that are linear 
combinations of a finite number of Diracs per unit time can be acquired by linear filtering followed by 
uniform sampling. However, in reality, samples are rarely noiseless. In this paper, we introduce a novel 
stochastic algorithm to reconstruct a signal with finite rate of innovation from its noisy samples. Even 
though variants of this problem has been approached previously, satisfactory solutions are only available 
for certain classes of sampling kernels, for example kernels which satisfy the Strang-Fix condition. In this 
paper, we consider the infinite- support Gaussian kernel, which does not satisfy the Strang-Fix condition. 
Other classes of kernels can be employed. Our algorithm is based on Gibbs sampling, a Markov chain 
Monte Carlo (MCMC) method. Extensive numerical simulations demonstrate the accuracy and robustness 
of our algorithm. 
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I. Introduction 

The celebrated Nyquist-Shannon sampling theorem HI, |[2||1 states that a signal x{t) known to be 
bandlimited to l^max is uniquely determined by samples of x{t) spaced l/(2r2jnax) apart. The textbook 
reconstruction procedure is to feed the samples as impulses to an ideal lowpass (sine) filter. Furthermore, 
if x{t) is not bandlimited or the samples are noisy, introducing pre-filtering by the appropriate sine 
sampling kernel gives a procedure that finds the orthogonal projection to the space of ilmax-bandlimited 
signals. Thus the noisy case is handled by simple, linear, time-invariant processing. 

Sampling has come a long way since the sampling theorem, but until recently the results have mostly 
applied only to signals contained in shift-invariant subspaces llH. Moving out of this restrictive setting, 
Vetterli et al. Q showed that it is possible to develop sampling schemes for certain classes of non- 
bandlimited signals that are not subspaces. As described in |5|, for reconstruction from samples it is 
necessary for the class of signals to have finite rate of innovation (FRI). The paradigmatic example is 
the class of signals expressed as 



{t) = ckHt - tk) 



X 

k 



where (p{t) is some known function. For each term in the sum, the signal has two real parameters Ck and 
tk- If the density of t^s (the number that appear per unit of time) is finite, the signal has FRI. It is shown 
constructively in |5| that the signal can be recovered from (noise-less) uniform samples of x(t) * h{t) 
(at a sufficient rate) when * h{t) is a sine or Gaussian function. Results in 161 are based on similar 
reconstruction algorithms and greatly reduce the restrictions on the sampling kernel h{t). 

In practice, though, acquisition of samples is not a noiseless process. For instance, an analog-to-digital 
converter (ADC) has several sources of noise, including thermal noise, aperture uncertainty, comparator 
ambiguity, and quantization Q. Hence, samples are inherently noisy. This motivates our central question: 
Given the signal model (i.e. a signal with FRI) and the noise model, how well can we approximate the 
parameters that describe the signal? In this work, we address this question and develop a novel algorithm 
to reconstruct the signal from the noisy samples, which we will denote y[n\ (see Fig. [T]). 

A. Related Work and Motivation 

Signals with FRI were initially introduced by Vetterli et al. f5\. The reconstruction schemes hinged on 
identifying algebraically-independent parameters of the signals, e.g. the weights {cfe} and time locations 

'a more expansive term could be the Whittaker-Nyquist-Kotelnikov-Shannon sampling theorem; see, e.g., (3), (4|. 
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Fig. 1. Block diagram showing our problem setup. x{t) is a signal with FRI given by (TJ and h(t) is the Gaussian filter with 
width ah given by e[n] is i.i.d. Gaussian noise with standard deviation and y[n] are the noisy samples. From y[n] we 
will estimate the parameters that describe x{t), namely {ck,tk}^^i, and a^, the standard deviation of the noise. 



{tk}- In the seminal paper on FRI, the sampUng kernel for finite signals was chosen to be either the sine 
or the Gaussian. An annihilating filter approach led to an elegant algebraic solution via polynomial root 
finding and least squares. The authors alluded to the noisy case and suggested the use of the Singular 
Value Decomposition (SVD) for dealing with noisy samples. We will show that, in fact, this method is 
ill-conditioned because root-finding is itself not at all robust to noise. Thus it is not amenable to practical 
implementations, for instance on an ADC. 

Subsequently, Dragotti et al. |6| examined acquisition of the same signals with an eye toward im- 
plementability of the sampling kernel. Instead of using the sine and Gaussian kernels (which do not 
have compact support), the authors limited the choice of kernels to functions satisfying the Strang-Fix 
conditions {e.g. splines and scaling functions), exponential splines 191 and functions with rational 
Fourier transforms. They combined the moment-sampling and annihilating filter approaches to solve for 
the parameters. In our work, however, we will continue to use the Gaussian as our sampling kernel. 
We believe that, even though the Gaussian has infinite support, it can be well approximated by its 
truncated version. Hence, we can still draw insights from the analysis of using Gaussian filters and 
the subsequent reconstruction of the signal from its noisy samples y[n\. More importantly, unlike with 
previous approaches, the sampling kernel plays no fundamental role in the reconstruction algorithm. We 
use the Gaussian kernel because of its prominence in earlier work and the intuitiveness of its information 
spreading properties. 

Maravic and Vetterli [l^ and Ridolfi et al. ifTTI proposed and solved a related problem. Instead 
of modeling the noise at the output, they considered the scenario where x{t), the signal in question, is 
corrupted by additive white noise e{t). Clearly, Xf>{t) = x{t) + e{t) does not belong to the class of signals 
with FRI. However, in 111 01 . novel algebraic/subspace-based approaches solve the sampling problem in 
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the Laplace domain and these methods achieve some form of optimaUty. In lOTI . various algorithms 
including subspace -based approaches |[T2l (ESPRIT and MUSIC) as well as multidimensional search 
methods were used and comparisons were made. The authors concluded that, in the noisy signal case, 
the parameters can be recovered at a rate below that prescribed by the Shannon-Nyquist Theorem but at 
a factor above the critical rate. 

B. Our Contributions 

In our paper, we solve a different problem. We model the noise as additive noise to the acquired samples 
y[n\, not the signal x{t). Besides, we use the Gaussian sampling kernel and show that the ill-conditioning 
of the problem can be effectively circumvented. We demonstrate that under these conditions, we are able 
to estimate the parameters via a fully Bayesian approach based on Gibbs sampling (GS) |fT3l . |[T4ll . The 
prior methods are essentially algebraic while our algorithm is stochastic. As such, the maximization of 
the log-likelihood function, which we will derive in Section |llll is robust to initialization. 

More importantly, our algorithm is not constrained to work on the Gaussian kernel. Any kernel can 
be employed because the formulation of the Gibbs sampler does not depend on the specific form of 
the kernel h{t). Finally, all the papers mentioned failed to estimate the standard deviation of the noise 
process Ue. We address this issue in this paper. 

C. Paper Organization 

The rest of this paper is organized as follows: In Section |lll we will formally state the problem 
and define the notation to be used in the rest of the paper. We proceed to delineate our algorithm: 
a stochastic optimization procedure based on Gibbs sampling, in Section |llll We report the results of 
extensive numerical experiments in Section |IVl In Section |IVj we will also highlight some of the main 
deficiencies in [51, which motivate the need for new algorithms for recovering the parameters of a signal 
with FRI given noisy samples y[n]. We conclude our discussion in Section IVl and provide directions for 
further research. 

II. Problem Definition and Notation 

The basic setup is shown in Fig. [T] As mentioned in the introduction, we consider a class of signals 
characterized by a finite number of parameters. In this paper, similar to lH, ifTOl . 161 . the class is the 
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weighted sum of K Diracj^ 

it) = Y,ckS{t-tk)- (1) 



K 

k=l 



The signal to be estimated x{t) is filtered using a Gaussian low -pass filter 

h{t) = exp (-^) (2) 

with width ah to give the signal z{t). Even though h{t) does not have compact support, it can be well 
approximated by a truncated Gaussian, which does have compact support. The filtered signal z{t) is 
sampled at rate of 1/T seconds to obtain z[n] = z{nT) for n = 0, 1, . . . , — 1. Finally, additive white 
Gaussian noise (AWGN) e[n] is added to z[n\ to give y[n\. Therefore, the whole acquisition process from 
x{t) to {y[n]}^^Q can be represented by the model ^A 



M : y[n] = ^ exp + e[n] (3) 



k=l ^ ri 

for n = 0, 1, . . . , A'^ — 1. The amount of noise added is a function of df.. We define the signal-to-noise 
ratio (SNR) in dB as 

SNR ^ lOlog.o { J^f=ol^NP \ 

VEn=o I^N - ?/NIV 

In the sequel, we will use boldface to denote vectors. In particular, 

y = [y[0], y[l], y[A^-l]]T, (4) 

C = [Ci, C2, . . . , CkY, (5) 

t = [ti, t2, . . . , IkV . (6) 

We will sometimes use 6 = {c,t,cre} to denote the complete set of decision variables. We will be 
measuring the performance of our reconstruction algorithms by using the normalized reconstruction error 

A r^\Zest{t) - Z{t)? dt 

where Zest{t) is the reconstructed version of z{t). By construction £ > and the closer £ is to 0, the 
better the reconstruction algorithm. In sum, the problem can be summarized as: Given y = {y[n\ \ n = 
0, ... ,N — 1} and the model Ai, estimate the parameters {ck,tk}k=i minimize £. Also estimate the 
noise variance al- 

^The use of a Dirac delta simplifies the discussion. It can be replaced by a known pulse g{t) and then absorbed into the 
sampling kernel h{t), yielding an effective sampling kernel g{t) * h{t). 
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III. Presentation of the Gibbs sampler 

In this section, we will describe the stochastic optimization procedure based on Gibbs sampling to 
estimate 6 = {c,t,ae}- 

A. Gibbs Sampling (GS) 

Markov chain Monte Carlo (MCMC) in the form of the Gibbs sampler, and the Metropolis-Hastings 
algorithm allows any distribution to be simulated on a finite-dimensional state space specified by any 
conditional density. The Gibbs sampler was first studied by the statistical physics community |[T5l and 
then later in the statistics community |[T3l . |[T6l . fTT l. The basis for Gibbs sampling is the Hammersley- 
Clifford theorem ifTSl which states that given the data y, the conditional densities Pi{0i\6^j^ij,y, Ai) 
contain sufficient information to produce samples from the joint density p{6\y,A4). Furthermore, the 
joint density can be directly derived from the conditional densities. 

Gibbs sampling has been used extensively and successfully in image |[T3l and audio restoration 1141 . 
The Gibbs sampler is presented here to estimate 6 = {c, t,cre}. To simulate our Gibbs sampler, we use 
the i.i.d. Gaussian noise assumption and the model in (O to express the log-likelihood of the parameters 
given the observations as: 

logp{c,t,ae\y,M) 
oc -(iV + l)log(ae) 



N-l 



n=0 



K 
k=\ 



2 



(8) 



A Jeffrey's (improper) non-informative prior has been assigned to the standard deviation of the noise 
such that 

Vipe) oc — . (9) 

In the Gibbs sampling algorithm, as soon as a variate is drawn, it is inserted immediately into the 
conditional p.d.f. and it remains there until being substituted in the next iteration. This is shown in the 
following algorithmic 

Require: y, 1, 4, 0^ = {c^, t^, ct^°)} 
for i ^ 1 : / + If, do 

'For brevity, the dependence on the model M is omitted from the conditional density expressions. 
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cf^r.p^c^\ct'\ct'\■■■,4-'\t(^-'^at'\y) 
c« ~ p(c2|c« , cf^ . . . , cr ^ t(-i)ar^ y) 

c«~p(c,dc?,c«,...,c»_,,t(-i),ar^\y) 

^ Ti^f 

h ~ Pl*2|C^ ,t3 ,0-e ,yj 

4^~p(^x|c«,^^4^•••,4U>^^'^y) 

a«~p(ae|c«,t«,y) 
end for 

Compute ^MMSE using ([T0| 
return ^mmse 

In the algorithm, -d ~ p(-) means that i9 is a random draw from p{-). The superscript number (z) 
denotes the current iteration. After iteration^ the Markov chain approximately reaches its stationary 
distribution p{6\y,M). Minimum mean squared error (MMSE) estimates can then be approximated by 
taking averages of the samples from the next / iterations {6^^''~^^\ 6^^''^'^\ . . . , i.e.. 



^MMSE = j ep{e\y,M)de^j J2 ^ 



h+i 



(10) 



i=h+l 

B. Presentation of the Posterior Densities in the GS 

We will now derive the conditional densities. In the sequel, we will use the notation 0_i to denote the 
set of parameters excluding the ^th parameter. It follows from Bayes' theorem that 

p{ee\0-i,y,M)^p{y\e,M)p{e). (11) 

Thus, the required conditional distributions are proportional to the likelihood of the data times the priors 
on the parameters. The likelihood function of y given the model is given in ([8]l from the Gaussian noise 
assumption. Thus, we can calculate the posterior distributions of the parameters given the rest of the 
parameters. The parameters conditioned on are taken as constant and can be left out of the posterior. We 
will sample from these posterior densities in the GS iterations as shown in the above algorithm. 

We will now proceed to present the posterior densities. The derivations are provided in the Appendix. 

"^Ib is also commonly known as the burn-in period in the Gibbs sampling and MCMC literature 1141 . 
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1) Sampling Ck-' Ck is sampled from a Gaussian distribution given by 

Pk 1 



p{ck\e-c„y,M) =Af{ck; 



where 



JV-l 



Pk 



exp 



— y 

1 



2ak 2ak 
{nT - tkf 



crt 



cr. 



n=0 



K 



E Ck' exp 



k'=l 
[k'j^k 



{nT - tkf 



y[n\ 



It is easy to sample from Gaussian densities when the parameters {ak,/3k) have been determined. 
2) Sampling tk-' tk is sampled from a distribution of the form 



cx exp 



—y 



Tfcexp 



{nT - tkf 



crt 



+ i^k exp 



{nT - tkf 



where 



Ik 



Vk 



— c|, 



K 

y^ Ck' exp 

k'=l 
Ik'^k 



{nT - tk' 



y[n\ 



(12) 



(13) 



(14) 



(15) 

(16) 
(17) 



It is not straightforward to sample from this distribution. We can sample tk from a uniform grid of discrete 
values with probability masses proportional to (fTSl ). But in practice, and for greater accuracy, we used 
rejection sampling |[T9ll . ll20ll to generate samples from p{tk\0-t^,y,A4). The proposal distribution 
q{tk) was chosen to be an appropriately scaled Gaussian, since it is easy to sample from Gaussians. This 
is shown in the following algorithm. 



Require: p{tk) = p{tk\e^t,,y,M) 

Select q{tk) ~ Af and c s.t. p{tk) < cq{tk) 
n ~Z^(0,1) 
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Hist of c* ' after convergence Hist of t'^ ' after convergence 

200 1 ■ ■ ' 





9.5 10 

True value of c^ = 1 True value of t, = 9 

(a) Histogram of the samples of c'*' (b) Histogram of the samples of fj*' 

Fig. 2. Note that the variance of the stationary distribution of the tkS is smaller than that of the c^s after convergence of the 
Markov chain. 



repeat 

tk ~ q{tk) 

until u < p{tk) / {cq{tk)) 

3) Sampling a^: o"e is sampled from the 'Square -root Inverted-Gamma' lIlTI distribution X^^^/^((Je; y?, A 



p{ae\6-^^,y,M) 



-(2</,+l) 



exp I - A 



^[0,+oo 



where 



N 
2"' 



A 1 



y[n\ 



K 

E 

fe=i 



Cfcexp 



{nT - tkf 



(18) 

(19) 
(20) 



Thus the distribution of the variance of the noise a1 is Inverted Gamma, which corresponds to the 
conjugate prior of cjg in the expression of N{e\Q,a1) fT\\ and thus it is easy to sample from. In our 
simulations, we sampled from this density using the Matlab function gamrnd and applied the 'Inverted 
Square-root' transformation 

C. Further Improvements via Linear Least Squares Estimation 

We can perform an additional post-processing step to improve on the estimates of Ck- We noted from 
our preliminary experiments (see Fig.|2l) that the variance of the stationary distribution of the t^ts is smaller 



X follows a 'Square-root Inverted-Gamma' distribution if X follows a Gamma distribution. 
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Param. 


K 




TV 


SNR 


Value 


5 


and 10"'' 


30 


oo and 137 dB 



TABLE I 

Parameter values for demonstration of the annihilating filter and root-finding algorithm. 



than that of the c^s. This results in better estimates for the locations t^s as compared to the magnitudes 
CfcS. Now, we observe that y[n], the observations, are linear in the CfcS once the tfcS are known. A natural 
extension to our GS algorithm is to augment our Ck estimates with a linear least squares estimation 
(LLSE) procedure using y and the MMSE estimates of tk- Eqn. ^ can be written as 

K 

y[n]=^Ckh{nT -tk)+e[n\, < n < iV - 1 (21) 
fc=i 

with h{t), the Gaussian sampling kernel, given in Q. Given the set of estimates of the time locations 
{^k}k=v '^^'^ rewrite (|2T] ) as a matrix equation, giving 

y = He + e, 

where [H]„fc = h{nT — tk) and l<n<A^, 1<A;<K. We now minimize the square of the residual 
||e|p = 1 1 He — yp, giving the normal equations H^He = H^y and the least squares solution |22| 

CLs = (HTH)-iHTy (22) 

From our experiments, we found that, in general, using cls as estimates for the magnitudes of the 
impulses provided a lower reconstruction error £. 

IV. Numerical Results and Experiments 

In this section, we will first review the annihilating filter and root-finding method algorithm for solving 
for the parameters of a signal with FRI. This algorithm provides a baseline for comparison. Then we will 
provide extensive simulation results to validate the accuracy of the algorithm we proposed in Section [nil 

A. Problems with Annihilating Filter and Root-Finding 

In in, Vetterli et al. introduced the concept of a class of signals with a finite rate of innovation. For 
signals of the form ([T|l and certain sampling kernels, the annihilating filter was used as a means to locate 
the tk values. Subsequently a least squares approach yielded the weights Ck- It was shown that in the 
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tr =0, E = 4.49e-10 



o = 1e-6, E = 0.2721 






z(t) 


1 \ A " 
1 \f\ 1 \'A 

/ / ' ' 





-10 10 20 



(a) The annihilating filter approach reconstructs the 
signal exactly in the noiseless scenario. 



(b) The reconstruction completely breaks down 
when noise of a small standard deviation = 10~® 
(SNR=137 dB) is added. 



Fig. 3. Demonstration of the annihilating filter/root-finding approach. 



noiseless scenario, this method recovers the parameters exactly (see Fig. Oa)). For completeness, we will 
briefly outline their method here. Denoting the noiseless samples by z[n\, ^ can be written as 

K 

p[n]=^akul, n = 0,l, N -1, (23) 

k=l 

with the identifications 

p[n] = exp(n2rV(2a^))z[n], (24) 

ak = Cfcexp(-t|/(2cr^)), (25) 

Uk = expitkT/al). (26) 

Now, since p[n\ is a linear combination of exponentials, we find the annihilating filter a[n\ such that 

K 

a[n] * p[n] 

= a[e]p[n - ^] = 0, Vn G Z. 

This can be written in matrix/vector form as Pa = 0. This system will admit a solution when rank(P) = 
K. In practice this is solved using an SVD where P = USV-^ and a = Yex+i and ca'+i is a length- 
(K + 1) vector with 1 in position {K + 1) and elsewhere. Now, once the coefficients a[n\ are found, 
the values Uk axe simply the roots of the filter 

K 

A{z)=Ya[n]z--. 

n=0 

The tfcS can then be determined from ( [25] ) and the solution for the c/^s essentially parallels the development 
in Section HITCl 
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In the same paper, it was suggested that to deal with the noisy samples, we can minimize ||Pa||, in 
which case, a is the eigenvector that corresponds to the smallest eigenvalue of P^P. Here, we argue that 
this method is inherently ill-conditioned and thus not robust to noise. 

1) Firstly, minimizing ||Pa|| involves finding the eigenvector vi that corresponds to the largest eigen- 
value Ai- Because computing eigenvalues and eigenvectors are essentially root-finding operations, 
this is ill-conditioned. 

2) Secondly, even if the vector a = vi can be found, the zeros of the filter A{z) have to be found. 
This again involves root finding, which is ill-conditioned. 

3) Finally, from (l24l) . any noise added to z[n\ will be exponentially weighted in the observations p[n\. 
We feel that this is the greatest source of ill-conditioning. 

Because of the three reasons highlighted above, there is a need to explore new algorithms for finding the 
parameters. In Fig.[3l we show a simulation with the parameters as tabulated in Table Jl but we varied the 
noise (cTg = 10~® gives SNR = 137 dB, a very low noise level). We observe from Fig. |3(b)| that (without 
oversampling) the annihilating filter and root-finding method is not robust even when a miniscule amount 
of noise is added. 

Remark The root-finding method is so unstable that, at times, even for low levels of noise, we obtain 
complex roots for the locations {tk}j^SQ. To solve this problem, we orthogonally projected the polynomial 
described by the filter coefficients a[n\ to the closest polynomial that belongs to the space of polynomials 
with real roots only. 

B. Performance of our GS Algorithm 

Clearly, the annihilating filter/root-finding algorithm is not robust to noise. We have suggested an 
alternative reconstruction algorithm in Section |nll and in this section, we will present our results on 
several synthetic examples 

1) Initial Demonstration: To demonstrate the evolution the Gibbs sampler, we performed an initial 
experiment and chose the parameters to be those in Table Jl with the exception that the noise standard 
deviation was increased to cje = 2.5, giving an SNR of 10.2 dB. We plot the iterates in Fig. |4l The true 
filtered signal z{t) and its estimate Zest{t) are plotted in Fig. [S] We note the close similarity between 
z{t) and Zest{t). 

'All the code, written in MATLAB, can be found at the first author's homepage |http://web.niit.edu/vtan/frimcmc[ 
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20 40 60 80 100 20 40 60 80 100 

Iteration Iteration 

(c) Evolution of the (Te (d) Reduction of the (negative) log-likelihood 

-logp(c,t,cre \ y,M) 

Fig. 4. Evolution of the GS algorithm. The iterates of the parameters {ck,tk}^=i and tJe are shown. The true values are 
indicated by the broken red lines. In Fig. |4(d)[ we see that the negative log-likelihood converges to the global minimum in fewer 
than 20 iterations for this problem size (K = 5). 



We observe that the sampler converges in fewer than 20 iterations for this run, even though the parameter 
values were initialized far from their optimal values. We emphasize that as GS is essentially a stochastic 
optimization procedure (not unlike Simulated Annealing or Genetic Algorithms), it is insensitive to the 
choice of starting point 6^^\ The Markov Chain is guaranteed to converge to the stationary distribution 
after the burn-in period |[T9l . 

2 ) Further Experiments: To further validate our algorithm, we performed extensive simulations (Expts 
A and B) on two different problem sizes to validate our algorithm. For consistency, each experiment was 
repeated using 100 different random seeds and the means of £ [cf. ([T])] taken. The parameters are 
chosen according to Table |lll The unknown parameters were initialized as c'^'^^ = t^'^) = [0, . . . , 0] and 
= 0.01. The results for Expts A and B are shown in Fig. |7(a)| and |7(b)| respectively. We noted from 
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z(t) 









-5 5 10 15 20 25 



Fig. 5. Comparison between z(t) and Zest(i) using the GS algorithm. For this run, £ = 0.0072. 



Param. 


K 




iV 


SNR 


£ 


Expt A 


5 


1.5:0.25:3.0 


50:25:150 


Fig Hi)] 


Fig|7(l 


Expt B 


10 


3.0:0.50:6.0 


100:50:250 


Fig|6(b)l 


Fig|7(b)| 



TABLE II 

Parameter values for numerical simulations. 



these experiments that: 

• The GS algorithm is insensitive to initialization. It always finds approximately optimal estimates from 
any starting point because the Markov chain provably converges to the stationary distribution |19|. 

• The LLSE post-processing step in the GS algorithm reduces the reconstruction error £. This is a 




a 

e 



(a) SNR (dB) against ae for Expt A (K ^ 5). (b) SNR (dB) against ae for Expt B (K = 10). 

Fig. 6. SNR (dB) against ae for the two experiments. 
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1.5 2 2.5 3 3 3.5 4 4.5 5 5.5 6 



e e 

(a) Errors £ against (Je for Expt A {K = 5). (b) Errors £ against (Te for Expt B (K = 10). 

Fig. 7. Plots of £ against (Je for various oversampling factors and problem sizes. 

consequence of using the (more accurate) t^s from the sampler to estimate the c^s via LLSE, instead 
of using the c^s from the sampler directly. 

• From the two plots in Fig. |7J we observe that, if the problem size doubles (from K = to K = 10), 
with corresponding doubling of {(Je,N), £ remains approximately constant. This insures scalability 
of the algorithm. For example, £{K = 5,cJe = 2.5, = 50) w 8{K = 10, = 5.0, = 100) ^ 
0.045. 

• The noise standard deviation cTg can be estimated accurately in the GS algorithm as shown in 
Fig. |4(c)[ This may be important in some applications. 

To conclude, even though the annihilating filter approach Q is more computationally efficient than our 
algorithms, it is certainly not amenable to scenario where noisy samples are acquired. 

V. Conclusions 

In this paper, we addressed the problem of reconstructing a signal with FRI given noisy samples. We 
showed that it is possible to circumvent some of the problems of the annihilating filter and root-finding 
approach |5]. We introduced the Gibbs sampling algorithm. From the performance plots, we observe that 
GS performs very well as compared to the annihilating filter method, which is not robust to noise. 

Perhaps the most important observation we made is the following: The success of the fully Bayesian 
GS algorithm does not depend on the choice of kernel h{t). The formulation of the GS does not depend 
on the specific form of h{t). In fact, we used a Gaussian sampling kernel to illustrate that our algorithm 
is not restricted to the classes of kernels considered in [HI. 

A natural extension to our work here is to assign structured priors to c, t and cTg. These priors can 
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themselves be dependent on their own set of hyperparameters, giving a hierarchical Bayesian formulation. 
In this way, there would be greater flexibility in the parameter estimation process. We can also seek 
to improve on the computational load of the algorithms introduced here. Another interesting research 
direction is to examine the feasibility of using the subspace-based approaches [10] to solve the problem 
of acquired samples that are noisy. 

A question that remains is: How well can real-world signals (including natural images) be modeled 
as signals with FRI? We believe the answer will have profound ramifications for areas such as sparse 
approximation 1231 and compressed sensing |[24l . 1251 . 



For brevity, we define 



Appendix 

Derivation of the Conditional Densities 



Qnk = h{nT - tk) = exp ( — 



We start from the log-likelihood of the parameters 6 given the data y and model Ai [cf. ([8])]. To 
obtain p{ck\0_c^,y, Ai), we treat the other parameters 0_c^ as constant, giving log p{ck\9 _f.k ^ Y ^ 
proportional to 



^ N-l 



( 



K 



\ 



X] Cfc'5nfc' - y[n\ 



, k'=l 
\k'=/=k 



J 



Comparing this expression in Ck to the Gaussian distribution with mean fi and variance a"^, 

logp(cfc; /i, a^) oc -^(cfc - 

and equating coefficients, we obtain (fT3l ) and (fT4b . The distribution , y, tM) can be obtained 

similarly and is omitted. Finally for the noise standard deviation a"e, 

logp{ae\6-cT,,y,M) oc -{N + l)log(cre) - 

where A is defined in ( |20l ). Taking the antilog on both sides yields 

p{ae\e.„^,y,M) oco--(^+^) exp ^-^^ , 

which is the 'Square-root Inverted-Gamma' distribution with parameters given by ( fT9l ) and (l20l ). AU the 
densities have been derived. 
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